44 research outputs found

    Taxonomies of regular tree algorithms

    Get PDF
    Algorithms for acceptance, pattern matching and parsing of regular trees and the tree automata used in these algorithms have many applications, including instruction selection in compilers, implementation of term rewriting systems, and model checking. Many such tree algorithms and constructions for such tree automata appear in the literature, but some deficiencies existed, including: inaccessibility of theory and algorithms; difficulty of comparing algorithms due to variations in presentation style and level of formality; and lack of reference to the theory in many publications. An algorithm taxonomy is an effective means of bringing order to such a field. We report on two taxonomies of regular tree algorithms that we have constructed to deal with the deficiencies. The complete work has been presented in the PhD thesis of the first author

    On compile time Knuth-Morris-Pratt precomputation

    Get PDF
    Many keyword pattern matching algorithms use precomputation subroutines to produce lookup tables, which in turn are used to improve performance during the search phase. If the keywords to be matched are known at compile time, the precomputation subroutines can be implemented to be evaluated at compile time versus at run time. This will provide a performance boost to run time operations. We have started an investigation into the use of metaprogramming techniques to implement such compile time evaluation, initially for the Knuth-Morris-Pratt (KMP) algorithm. We present an initial experimental comparison of the performance of the traditional KMP algorithm to that of an optimised version that uses compile time precomputation. During implementation and benchmarking, it was discovered that C++ is not well suited to metaprogramming when dealing with strings, while the related D language is. We therefore ported our implementation to the latter and performed the benchmarking with that version. We discuss the design of the benchmarks, the experience in implementing the benchmarks in C++ and D, and the results of the D benchmarks. The results show that under certain circumstances, the use of compile time precomputation may significantly improve performance of the KMP algorithm

    On minimizing deterministic tree automata

    Get PDF
    We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classical one, initially presented by Brainerd in the 1960s and presented (sometimes imprecisely) in standard texts on tree language theory ever since. The second algorithm is completely new. This algorithm, essentially representing the generalization to ranked trees of the string algorithm presented by Watson and Daciuk, incrementally minimizes a dfrta. As a result, intermediate results of the algorithm can be used to reduce the initial automaton’s size. This makes the algorithm useful in situations where running time is restricted (for example, in real-time applications). We also briefly sketch how a concurrent specification of the algorithm in CSP can be obtained from an existing specification for the dfa case

    On minimizing deterministic tree automata

    Get PDF
    We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classical one, initially presented by Brainerd in the 1960s and presented (sometimes imprecisely) in standard texts on tree language theory ever since. The second algorithm is completely new. This algorithm, essentially representing the generalization to ranked trees of the string algorithm presented by Watson and Daciuk, incrementally minimizes a dfrta. As a result, intermediate results of the algorithm can be used to reduce the initial automaton’s size. This makes the algorithm useful in situations where running time is restricted (for example, in real-time applications). We also briefly sketch how a concurrent specification of the algorithm in CSP can be obtained from an existing specification for the dfa case

    Weak factor automata : the failure of failure factor oracles?

    Get PDF
    In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One e cient, compact representation is the factor oracle (FO). At the same time, any classical deterministic nite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 - 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.http://www.journals.co.za/ej/ejour_comp.htmlam201

    Deriving the Boyer-Moore-Horspool algorithm

    No full text
    The Post-Proceedings of this Festschrift will be formally published in The South African Computer Journal number 41

    A Boyer-Moore-Horspool algorithm derivation

    No full text
    The keyword pattern matching problem has been frequently studied, and many different algorithms for solving it have been suggested. Watson and Zwaan in the early 1990s derived a set of well-known solutions from a common starting point, leading to a taxonomy of such algorithms. Their taxonomy allowed one to easily compare the different algorithms, and to easily implement them in a taxonomy-based toolkit. The taxonomy and toolkit did not include a variant of the Boyer-Moore algorithm developed by Horspool. In this paper, we present the Boyer-Moore-Horspool algorithm and its multiple keyword generalization in the context of the taxonomy

    Taxonomy-based software construction of SPARE time : a case study

    No full text

    Combining regular expressions with near-optimal automata

    No full text
    Derivatives of regular expressions were first introduced by Brzozowski in [1]. By recursively computing all derivatives of a regular expression, and associating a state with each unique derivative, a deterministic finite automaton can be constructed. Convergence of this process is guaranteed if uniqueness of regular expressions is recognized modulo associativity, commutativity, and idempotence of the union operator. Additionaly, through simplification based on the identities for regular expressions, the number of derivatives can be further reduced
    corecore